Fix single-event chunks missing when using bulk export API #62138
Thanks for the quick fix 👍
Could you edit the changelog to specify the impact of the bug (how many events), and which component they should update? Also I think we don't need to specify the event type as this bug affects all events.
e.g.
Changelog: Fixed an Auth Service bug causing the event-handler to miss up to 1 event every 5 minutes when storing audit events in S3.
Ran the test without the fix and it does fail :)
=== RUN Test_querier_streamEventsFromChunk
querier_test.go:1043:
Error Trace: /Users/shaka/go/src/github.com/gravitational/teleport/lib/events/athena/querier_test.go:1043
Error: "[]" should have 1 item(s), but has 0
Test: Test_querier_streamEventsFromChunk
Is the missed-event rate (1 event every 5 minutes) tied to the uploaded chunk containing only 1 event? If not, would the changelog be misleading?
The single-event chunk is the easiest case to reproduce the bug, but the reader seems to read events two at a time. So if a finished chunk holds 11 events, you get five reads of 2 events each, and the last read returns (1, EOF). The last event is therefore dropped whenever len(chunk) % 2 == 1. Assuming the parity of the number of events in 5 minutes is random, we lose 1 event every 10 minutes on average, with a worst case of 1 event every 5 minutes.
* Fix single-event chunks missing when using bulk export API
* Fix lint
Fixes #61729
When using the bulk export API, the event handler can miss certain events that are processed in single-event chunks. This includes access_request.create and session.upload audit events. With this fix, the export API now correctly processes single-event chunks so the event handler can see them.
Big thanks to Forrest and Hugo for the detailed explanations and solution!
Manual Tests
Test: Event handler processes and forwards single-chunk events using bulk export API
* access_request.create event is forwarded to fluentd audit events endpoint (test.log)
* session.upload event is forwarded to fluentd audit events endpoint, and all session events (e.g. session.start, resize, session.end) are forwarded to fluentd session events endpoint (session.*.log)

changelog: Fixed an Auth Service bug causing the event handler to miss up to 1 event every 5 minutes when storing audit events in S3